Privacy Preservation by Disassociation
In this work, we focus on protection against identity disclosure in the
publication of sparse multidimensional data. Existing multidimensional
anonymization techniques (a) protect the privacy of users either by altering the
set of quasi-identifiers of the original data (e.g., by generalization or
suppression) or by adding noise (e.g., using differential privacy) and/or (b)
assume a clear distinction between sensitive and non-sensitive information and
sever the possible linkage. In many real-world applications the above
techniques are not applicable. For instance, consider web search query logs.
Suppressing or generalizing anonymization methods would remove the most
valuable information in the dataset: the original query terms. Additionally,
web search query logs contain millions of query terms which cannot be
categorized as sensitive or non-sensitive since a term may be sensitive for one
user and non-sensitive for another. Motivated by this observation, we propose
an anonymization technique termed disassociation that preserves the original
terms but hides the fact that two or more different terms appear in the same
record. We protect the users' privacy by disassociating record terms that
participate in identifying combinations. This way the adversary cannot
associate with high probability a record with a rare combination of terms. To
the best of our knowledge, our proposal is the first to employ such a technique
to provide protection against identity disclosure. We propose an anonymization
algorithm based on our approach and evaluate its performance on real and
synthetic datasets, comparing it against other state-of-the-art methods based
on generalization and differential privacy.
Comment: VLDB201
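The notion of an "identifying combination" in the abstract above can be illustrated with a small sketch. It only detects term combinations that appear in too few records; it does not perform disassociation itself and is not the paper's algorithm. The threshold `k`, the maximum combination size `m`, and the sample query log are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def rare_combinations(records, k, m=2):
    """Return term combinations of size 2..m that occur in fewer than k records.

    `k` and `m` are hypothetical parameters chosen for this sketch.
    """
    support = Counter()
    for record in records:
        terms = sorted(set(record))  # sort so each combination has one canonical form
        for size in range(2, m + 1):
            for combo in combinations(terms, size):
                support[combo] += 1
    return {combo for combo, count in support.items() if count < k}


# Hypothetical query log: each record is the set of terms issued by one user.
logs = [
    {"flu", "aspirin", "madrid"},
    {"flu", "aspirin"},
    {"flu", "madrid"},
    {"aspirin", "hotel"},
]
rare = sorted(rare_combinations(logs, k=2))
print(rare)  # [('aspirin', 'hotel'), ('aspirin', 'madrid')]
```

Each combination returned here appears in a single record, so an adversary who knows both terms could single out that record. These are the kinds of term pairs that would have to stop appearing together in the published data.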
Parallel In-Memory Evaluation of Spatial Joins
The spatial join is a popular operation in spatial database systems and its
evaluation is a well-studied problem. As main memories become bigger and faster
and commodity hardware supports parallel processing, there is a need to revamp
classic join algorithms which have been designed for I/O-bound processing. In
view of this, we study the in-memory and parallel evaluation of spatial joins,
by re-designing a classic partitioning-based algorithm to consider alternative
approaches for space partitioning. Our study shows that, compared to a
straightforward implementation of the algorithm, our tuning can improve
performance significantly. We also show how to select appropriate partitioning
parameters based on data statistics, in order to tune the algorithm for the
given join inputs. Our parallel implementation scales gracefully with the
number of threads, reducing the cost of the join to at most one second even for
join inputs with tens of millions of rectangles.
Comment: Extended version of the SIGSPATIAL'19 paper under the same title
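As a rough illustration of what a classic partitioning-based spatial join looks like, the sketch below assigns rectangles to the tiles of a uniform grid and joins each tile independently. It is a minimal baseline, not the tuned algorithm studied in the paper. The grid cell size, the `(xlo, ylo, xhi, yhi)` rectangle layout, and the reference-point test used to avoid reporting a pair twice are assumptions of this sketch.

```python
from collections import defaultdict


def intersects(a, b):
    """Closed-rectangle intersection test on (xlo, ylo, xhi, yhi) tuples."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def partition(rects, cell):
    """Map each grid tile to the indices of the rectangles overlapping it."""
    grid = defaultdict(list)
    for i, (xlo, ylo, xhi, yhi) in enumerate(rects):
        for tx in range(int(xlo // cell), int(xhi // cell) + 1):
            for ty in range(int(ylo // cell), int(yhi // cell) + 1):
                grid[(tx, ty)].append(i)
    return grid


def grid_join(R, S, cell):
    """Return sorted index pairs (i, j) such that R[i] intersects S[j]."""
    grid_r, grid_s = partition(R, cell), partition(S, cell)
    out = []
    for tile in grid_r.keys() & grid_s.keys():  # tiles are independent units of work
        for i in grid_r[tile]:
            for j in grid_s[tile]:
                a, b = R[i], S[j]
                if not intersects(a, b):
                    continue
                # Report the pair only in the tile that holds the lower-left
                # corner of the intersection, so replicated pairs appear once.
                ref = (int(max(a[0], b[0]) // cell), int(max(a[1], b[1]) // cell))
                if ref == tile:
                    out.append((i, j))
    return sorted(out)


R = [(0, 0, 5, 5), (8, 8, 12, 12)]
S = [(4, 4, 11, 11), (20, 20, 21, 21)]
pairs = grid_join(R, S, cell=5)
print(pairs)  # [(0, 0), (1, 0)]
```

Because every tile is joined on its own, the outer loop is a natural place to distribute work across threads, and the choice of `cell` controls how much rectangles are replicated across tiles.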
Two-layer Space-oriented Partitioning for Non-point Data
Non-point spatial objects (e.g., polygons and linestrings) are ubiquitous.
We study the problem of indexing non-point objects in memory for range queries
and spatial intersection joins. We propose a secondary partitioning technique
for space-oriented partitioning indices (e.g., grids), which improves their
performance significantly, by avoiding the generation and elimination of
duplicate results. Our approach is easy to implement and can be used by any
space-partitioning index to significantly reduce the cost of range queries and
intersection joins. In addition, the secondary partitions can be processed
independently, which makes our method appropriate for distributed and parallel
indexing. Experiments on real datasets confirm the advantage of our approach
over alternative duplicate elimination techniques and data-oriented
state-of-the-art spatial indices. We also show that our partitioning technique,
paired with optimized partition-to-partition join algorithms, typically reduces
the cost of spatial joins by around 50%.
Comment: To appear in the IEEE Transactions on Knowledge and Data Engineering
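One way a secondary partitioning inside each grid tile could avoid duplicates without generating them is sketched below. Each tile's rectangles are split by whether they start inside the tile in x and in y, and a range query reads only the classes that cannot have been reported by an earlier tile. The class names `A` to `D`, the uniform grid, and the function names are assumptions made for illustration; the abstract does not spell out these details.

```python
from collections import defaultdict


def intersects(a, b):
    """Closed-rectangle intersection test on (xlo, ylo, xhi, yhi) tuples."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build(rects, cell):
    """Grid index whose tiles are split into four secondary partitions.

    A: starts inside the tile in x and y    B: starts inside in x only
    C: starts inside in y only              D: starts before the tile in both
    """
    index = defaultdict(lambda: {"A": [], "B": [], "C": [], "D": []})
    for i, (xlo, ylo, xhi, yhi) in enumerate(rects):
        sx, sy = int(xlo // cell), int(ylo // cell)
        for tx in range(sx, int(xhi // cell) + 1):
            for ty in range(sy, int(yhi // cell) + 1):
                in_x, in_y = tx == sx, ty == sy
                cls = "A" if in_x and in_y else "B" if in_x else "C" if in_y else "D"
                index[(tx, ty)][cls].append(i)
    return index


def range_query(index, rects, window, cell):
    """Return sorted indices of rectangles intersecting `window`, each found once."""
    wx, wy = int(window[0] // cell), int(window[1] // cell)
    result = []
    for tx in range(wx, int(window[2] // cell) + 1):
        for ty in range(wy, int(window[3] // cell) + 1):
            tile = index.get((tx, ty))
            if tile is None:
                continue
            classes = ["A"]
            if ty == wy:
                classes.append("B")  # objects starting below: first tile row only
            if tx == wx:
                classes.append("C")  # objects starting to the left: first column only
            if tx == wx and ty == wy:
                classes.append("D")
            for cls in classes:
                result.extend(i for i in tile[cls] if intersects(rects[i], window))
    return sorted(result)


rects = [(1, 1, 12, 12), (6, 6, 7, 7), (20, 20, 22, 22)]
index = build(rects, cell=5)
hits = range_query(index, rects, (4, 4, 11, 11), cell=5)
print(hits)  # [0, 1]
```

In this sketch the first rectangle is stored in nine tiles, yet the query reports it once, because classes B, C, and D are skipped in every tile that is not on the query's first row or column. No result set has to be deduplicated afterwards.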
On-Line Discovery of Hot Motion Paths
We consider an environment of numerous moving objects, equipped with location-sensing devices and capable of communicating with a central coordinator. In this setting, we investigate the problem of maintaining hot motion paths, i.e., routes frequently followed by multiple objects over the recent past. Motion paths approximate portions of objects' movement within a tolerance margin that depends on the uncertainty inherent in positional measurements. Discovery of hot motion paths is important to applications requiring classification/profiling based on monitored movement patterns, such as targeted advertising and resource allocation. To achieve this goal, we delegate part of the path extraction process to objects, by assigning to them adaptive lightweight filters that dynamically suppress unnecessary location updates and, thus, help reduce the communication overhead. We demonstrate the benefits of our methods and their efficiency through extensive experiments on synthetic data sets.
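The idea of a client-side filter that suppresses unnecessary location updates can be sketched with a fixed tolerance. This is a simpler stand-in for the adaptive filters the abstract describes: the class name `ToleranceFilter`, the fixed threshold `eps`, and the sample trace are all assumptions made for illustration.

```python
import math


class ToleranceFilter:
    """Report a position only when it drifts more than `eps` from the last report."""

    def __init__(self, eps):
        self.eps = eps    # hypothetical tolerance margin, in coordinate units
        self.last = None  # last position sent to the coordinator

    def should_report(self, x, y):
        if self.last is None or math.dist(self.last, (x, y)) > self.eps:
            self.last = (x, y)
            return True
        return False      # update suppressed: still within tolerance


f = ToleranceFilter(eps=1.0)
trace = [(0, 0), (0.5, 0), (0.9, 0.3), (2, 0), (2.2, 0.1)]
sent = [f.should_report(x, y) for x, y in trace]
print(sent)  # [True, False, False, True, False]
```

Only two of the five measurements reach the coordinator in this trace, which is the communication saving that such filters aim for.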
Privacy-Preserving Release of Spatio-temporal Density
In today’s digital society, increasing amounts of contextually rich spatio-temporal information are collected and used, e.g., for knowledge-based decision making, research purposes, optimizing operational phases of city management, planning infrastructure networks, or developing timetables for public transportation with an increasingly autonomous vehicle fleet. At the same time, however, publishing or sharing spatio-temporal data, even in aggregated form, is not always viable owing to the danger of violating individuals’ privacy, along with the related legal and ethical repercussions. In this chapter, we review some fundamental approaches for anonymizing and releasing spatio-temporal density, i.e., the number of individuals visiting a given set of locations as a function of time. These approaches follow different privacy models providing different privacy guarantees as well as accuracy of the released anonymized data. We demonstrate some sanitization (anonymization) techniques with provable privacy guarantees by releasing the spatio-temporal density of Paris, France. We conclude that, in order to achieve meaningful accuracy, the sanitization process has to be carefully customized to the application and public characteristics of the spatio-temporal data.
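As one example of a sanitization technique with a provable guarantee, the sketch below perturbs per-cell visit counts with Laplace noise, the standard mechanism for differential privacy. It is not claimed to be the method used in the chapter. The function name, the `(location, hour)` cell keys, and the default `sensitivity` of 1 are assumptions; the default holds only if one individual can change the counts by at most 1 in total, which real traces spanning many cells would violate.

```python
import random


def noisy_density(counts, epsilon, sensitivity=1.0, rng=None):
    """Add Laplace(sensitivity / epsilon) noise to every count.

    `counts` maps a (location, time) cell to the number of visitors.
    """
    rng = rng or random.Random()
    rate = epsilon / sensitivity  # rate of each exponential = 1 / Laplace scale
    noisy = {}
    for cell, n in counts.items():
        # The difference of two i.i.d. exponentials is Laplace-distributed.
        noisy[cell] = n + rng.expovariate(rate) - rng.expovariate(rate)
    return noisy


# Hypothetical density: visitors per (district, hour) cell.
density = {("district_1", 8): 120, ("district_2", 8): 45}
released = noisy_density(density, epsilon=0.5, rng=random.Random(0))
print(released)
```

A smaller `epsilon` gives stronger privacy but noisier counts, which reflects the chapter's conclusion that sanitization has to be tuned to the application to keep the released density useful.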
Privacy Preservation in the Dissemination of Location Data
The rapid advance in handheld communication devices and the appearance of smartphones have allowed users to connect to the Internet and browse the Web while they are moving around the city or traveling. Location-based services have been developed to deliver content that is adjusted to the current user location. Social networks have also responded to the challenge of users who can access the Internet from any place in the city, and location-based social networks like Foursquare have become very popular in a short period of time. The popularity of these applications is linked to the significant advantages they offer: users can exploit live location-based information to make dynamic decisions on issues like transportation, identification of places of interest, or even the opportunity to meet a friend or an associate in nearby locations. A side effect of sharing location-based information is that it exposes the user to substantial privacy-related threats. Revealing the user’s location carelessly can prove to be embarrassing, professionally harmful, or even dangerous. Research in the data management field has put significant effort into anonymization techniques that obfuscate spatial information in order to hide the identity of the user or her exact location. Privacy guarantees and anonymization algorithms are becoming increasingly sophisticated, offering better and more efficient protection in data publishing and data exchange. Still, it is not yet clear what the greatest dangers to user privacy are and which privacy-breaching scenarios are the most realistic. The aim of the paper is to provide a brief survey of the attack scenarios, the privacy guarantees, and the data transformations employed to protect user privacy in real time. The paper focuses mostly on providing an overview of the privacy models that are investigated in the literature and less on the algorithms and their scaling capabilities.
The models and the attack scenarios are classified and compared, in order to provide an overview of the cases that are covered by existing research.